{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multilayer perceptrons in `gluon`\n",
    "\n",
    "Building a multilayer perceptron to classify MNIST images with `gluon` is not much harder\n",
    "than [implementing softmax regression with `gluon`](../chapter02_supervised-learning/softmax-regression-gluon.ipynb), like we did in Chapter 2.\n",
    "In that chapter, our entire neural network consisted \n",
    "of one Dense layer (`net = gluon.nn.Dense(num_outputs)`).\n",
    "\n",
    "In this chapter, we're going to show you \n",
    "how to compose multiple layers together \n",
    "into a neural network.\n",
    "There are two main ways to do this in Gluon and we'll walk through both. \n",
    "The first is to define a custom `Block`.\n",
    "In Gluon, everything is a Block! \n",
    "Layers, losses, whole networks, they're all blocks!\n",
    "So naturally, that's a flexible way to do nearly anything you want. \n",
    "\n",
    "We'll also make use of `gluon.nn.Sequential`.\n",
    "Sequential gives us a special way of rapidly building networks\n",
    "when follow a common design pattern: they look like a stack of pancakes.\n",
    "Many networks follow this pattern:\n",
    "a bunch of layers, one stacked on top of another,\n",
    "where the output of each layer is the input to the next layer.\n",
    "Sequential just takes a list of layers (we pass them in by calling `net.add(<Layer goes here!>)`.\n",
    "The following unnecessary picture should give you an intuitive sense of when to (and not to) use sequential.\n",
    "\n",
    "![](../img/sequential-not-sequential-layers.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports\n",
    "First we'll import the necessary bits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "import numpy as np\n",
    "import mxnet as mx\n",
    "from mxnet import nd, autograd, gluon"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll also want to set the contexts for our data and our models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()\n",
    "data_ctx = ctx\n",
    "model_ctx = ctx"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The MNIST dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "batch_size = 64\n",
    "num_inputs = 784\n",
    "num_outputs = 10\n",
    "num_examples = 60000\n",
    "def transform(data, label):\n",
    "    return data.astype(np.float32)/255, label.astype(np.float32)\n",
    "train_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform),\n",
    "                                      batch_size, shuffle=True)\n",
    "test_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),\n",
    "                                     batch_size, shuffle=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define the model with `gluon.Block`\n",
    "\n",
    "Now instead of having one `gluon.nn.Dense` layer, we'll want to compose several together. \n",
    "First let's go through the most fundamental way of doing this. Then we'll introduce some shortcuts.\n",
    "In `gluon` a Block has one main job - define a `forward` method that takes some NDArray input `x` and generates an NDArray output. \n",
    "Because the output and input are related to each other via NDArray operations, MXNet can take derivatives through the block automatically. \n",
    "A Block can just do something simple like apply an activation function. \n",
    "But it can also combine a bunch of other Blocks together in creative ways.\n",
    "In this case, we'll just want to instantiate three Dense layers. \n",
    "The forward can then invoke the layers in turn to generate its output. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MLP(gluon.Block):\n",
    "    def __init__(self, **kwargs):\n",
    "        super(MLP, self).__init__(**kwargs)\n",
    "        with self.name_scope():\n",
    "            self.dense0 = gluon.nn.Dense(64)\n",
    "            self.dense1 = gluon.nn.Dense(64)\n",
    "            self.dense2 = gluon.nn.Dense(10)\n",
    "        \n",
    "    def forward(self, x):\n",
    "        x = nd.relu(self.dense0(x))\n",
    "        x = nd.relu(self.dense1(x))\n",
    "        x = self.dense2(x)\n",
    "        return x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now instantiate a multilayer perceptron using our MLP class.\n",
    "And just as with any other block, we can grab its parameters with `collect_params` and initialize them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "net = MLP()\n",
    "net.collect_params().initialize(mx.init.Normal(sigma=.01), ctx=model_ctx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we can synthesize some gibberish data just to demonstrate one forward pass through the network. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = nd.ones((1,784))\n",
    "net(data.as_in_context(model_ctx))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because we're working with an imperative framework and not a symbolic framework, debugging Gluon Blocks is easy. If we want to see what's going on at each layer of the neural network, we can just plug in a bunch of Python print statements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MLP(gluon.Block):\n",
    "    def __init__(self, **kwargs):\n",
    "        super(MLP, self).__init__(**kwargs)\n",
    "        with self.name_scope():\n",
    "            self.dense0 = gluon.nn.Dense(64, activation=\"relu\")\n",
    "            self.dense1 = gluon.nn.Dense(64, activation=\"relu\")\n",
    "            self.dense2 = gluon.nn.Dense(10)\n",
    "        \n",
    "    def forward(self, x):\n",
    "        x = self.dense0(x)\n",
    "        print(\"Hidden Representation 1: %s\" % x)\n",
    "        x = self.dense1(x)\n",
    "        print(\"Hidden Representation 2: %s\" % x)\n",
    "        x = self.dense2(x)\n",
    "        print(\"Network output: %s\" % x)\n",
    "        return x\n",
    "\n",
    "net = MLP()\n",
    "net.collect_params().initialize(mx.init.Normal(sigma=.01), ctx=model_ctx)\n",
    "net(data.as_in_context(model_ctx))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Faster modeling with `gluon.nn.Sequential`\n",
    "\n",
    "MLPs, like many deep neural networks follow a pretty boring architecture. \n",
    "Just take a list of the layers, chain them together, and return the output.\n",
    "There's no reason why we have to actually define a new class every time we want to do this. \n",
    "Gluon's `Sequential` class provides a nice way of rapidly implementing this standard network architecture.\n",
    "We just \n",
    "\n",
    "* Instantiate a Sequential (let's call it `net`) \n",
    "* Add a bunch of layers to it using `net.add(...)`\n",
    "\n",
    "Sequential assumes that the layers arrive bottom to top (with input at the very bottom).\n",
    "We could implement the same architecture as shown above using sequential in just 6 lines."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_hidden = 64\n",
    "net = gluon.nn.Sequential()\n",
    "with net.name_scope():\n",
    "    net.add(gluon.nn.Dense(num_hidden, activation=\"relu\"))\n",
    "    net.add(gluon.nn.Dense(num_hidden, activation=\"relu\"))\n",
    "    net.add(gluon.nn.Dense(num_outputs))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parameter initialization\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "net.collect_params().initialize(mx.init.Normal(sigma=.1), ctx=model_ctx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Softmax cross-entropy loss"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimizer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': .01})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation metric"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluate_accuracy(data_iterator, net):\n",
    "    acc = mx.metric.Accuracy()\n",
    "    for i, (data, label) in enumerate(data_iterator):\n",
    "        data = data.as_in_context(model_ctx).reshape((-1, 784))\n",
    "        label = label.as_in_context(model_ctx)\n",
    "        output = net(data)\n",
    "        predictions = nd.argmax(output, axis=1)\n",
    "        acc.update(preds=predictions, labels=label)\n",
    "    return acc.get()[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "epochs = 10\n",
    "smoothing_constant = .01\n",
    "\n",
    "for e in range(epochs):\n",
    "    cumulative_loss = 0\n",
    "    for i, (data, label) in enumerate(train_data):\n",
    "        data = data.as_in_context(model_ctx).reshape((-1, 784))\n",
    "        label = label.as_in_context(model_ctx)\n",
    "        with autograd.record():\n",
    "            output = net(data)\n",
    "            loss = softmax_cross_entropy(output, label)\n",
    "        loss.backward()\n",
    "        trainer.step(data.shape[0])\n",
    "        cumulative_loss += nd.sum(loss).asscalar()\n",
    "\n",
    "\n",
    "    test_accuracy = evaluate_accuracy(test_data, net)\n",
    "    train_accuracy = evaluate_accuracy(train_data, net)\n",
    "    print(\"Epoch %s. Loss: %s, Train_acc %s, Test_acc %s\" %\n",
    "          (e, cumulative_loss/num_examples, train_accuracy, test_accuracy))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "In this chapter, we showed two ways to build multilayer perceptrons with Gluon. We demonstrated how to subclass `gluon.Block`, and define your own forward passes. We also showed how you might debug your network by lacing your forward pass with print statements. Finally, we showed how you could define and instantiate an equivalent network with just 6 lines of code by using `gluon.nn.Sequential`. Now that you understand the basics, you're ready to leap ahead. If you're following the book in order, then the next stop will be [dropout regularization](../chapter03_deep-neural-networks/mlp-dropout-scratch.ipynb). Other possible choices would be to start leanring about [convolutional neural networks](../chapter04_convolutional-neural-networks/cnn-scratch.ipynb) which are especialy handy for working with images, or [recurrent neural networks](../chapter05_recurrent-neural-networks/simple-rnn.ipynb), which are especially useful for natural language processing."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next\n",
    "[Dropout regularization from scratch](../chapter03_deep-neural-networks/mlp-dropout-scratch.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For whinges or inquiries, [open an issue on  GitHub.](https://github.com/zackchase/mxnet-the-straight-dope)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  },
  "widgets": {
   "state": {},
   "version": "1.1.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
